Garissa
RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset
Etori, Naome A., Gini, Maria L.
Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult because Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms the other models. For sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score (66.1%), while the semi-supervised XLM-R model reaches 67.2% accuracy and a 64.1% F1 score. In emotion analysis, the supervised DistilBERT model leads in accuracy (59.8%) and F1 score (31%), while the semi-supervised mBERT model reaches 59% accuracy and a 26.5% F1 score. AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with Afri-BERT showing the highest bias and a unique sensitivity to the empathy emotion. https://github.com/NEtori21/Ride_hailing
- Africa > Kenya > Nairobi City County > Nairobi (0.07)
- Africa > Kenya > Nairobi Province (0.06)
- Africa > Kenya > Mombasa County > Mombasa (0.05)
- Transportation > Passenger (1.00)
- Information Technology (1.00)
- Transportation > Ground > Road (0.93)
BART-SIMP: a novel framework for flexible spatial covariate modeling and prediction using Bayesian additive regression trees
Jiang, Alex Ziyu, Wakefield, Jon
Prediction is a classic challenge in spatial statistics and the inclusion of spatial covariates can greatly improve predictive performance when incorporated into a model with latent spatial effects. It is desirable to develop flexible regression models that allow for nonlinearities and interactions in the covariate structure. Machine learning models have been suggested in the spatial context, allowing for spatial dependence in the residuals, but fail to provide reliable uncertainty estimates. In this paper, we investigate a novel combination of a Gaussian process spatial model and a Bayesian Additive Regression Tree (BART) model. The computational burden of the approach is reduced by combining Markov chain Monte Carlo (MCMC) with the Integrated Nested Laplace Approximation (INLA) technique. We study the performance of the method via simulations and use the model to predict anthropometric responses, collected via household cluster samples in Kenya.
- North America > United States (0.46)
- Africa > Kenya > Nairobi City County > Nairobi (0.04)
- Africa > Kenya > Mombasa County > Mombasa (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Del Corro, Luciano, Del Giorno, Allie, Agarwal, Sahaj, Yu, Bin, Awadallah, Ahmed, Mukherjee, Subhabrata
Autoregressive large language models (LLMs) have made remarkable progress in various natural language generation tasks. However, they incur high computation cost and latency resulting from the autoregressive token-by-token generation. To address this issue, several approaches have been proposed to reduce computational cost using early-exit strategies. These strategies enable faster text generation using reduced computation without applying the full computation graph to each token. While existing token-level early exit methods show promising results for online inference, they cannot be readily applied for batch inferencing and Key-Value caching. This is because they have to wait until the last token in a batch exits before they can stop computing. This severely limits the practical application of such techniques. In this paper, we propose a simple and effective token-level early exit method, SkipDecode, designed to work seamlessly with batch inferencing and KV caching. It overcomes prior constraints by setting up a singular exit point for every token in a batch at each sequence position. It also guarantees a monotonic decrease in exit points, thereby eliminating the need to recompute KV Caches for preceding tokens. Rather than terminating computation prematurely as in prior works, our approach bypasses lower to middle layers, devoting most of the computational resources to upper layers, allowing later tokens to benefit from the compute expenditure by earlier tokens. Our experimental results show that SkipDecode can obtain 2x to 5x inference speedups with negligible regression across a variety of tasks. This is achieved using OPT models of 1.3 billion and 6.7 billion parameters, all the while being directly compatible with batching and KV caching optimization techniques.
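The two constraints described in the abstract (one shared exit point per sequence position, and exit points that never increase as generation proceeds) can be illustrated with a small sketch. This is not the authors' implementation: the linear decay, the function names `exit_schedule` and `layers_to_run`, and the choice to run only the topmost layers are all assumptions made for illustration.

```python
def exit_schedule(seq_len, max_layers, min_layers):
    """Layer budget for each generation position (hypothetical helper).

    Every token in a batch shares the budget for its position, and the
    budget never increases with position. A linear decay from max_layers
    to min_layers is assumed here purely for illustration.
    """
    if seq_len < 1 or not (1 <= min_layers <= max_layers):
        raise ValueError("need seq_len >= 1 and 1 <= min_layers <= max_layers")
    if seq_len == 1:
        return [max_layers]
    step = (max_layers - min_layers) / (seq_len - 1)
    # round() is monotone, so the rounded budgets stay non-increasing.
    return [round(max_layers - step * t) for t in range(seq_len)]


def layers_to_run(num_layers, budget):
    """Indices of the layers executed under a given budget (hypothetical helper).

    Simplification: the lowest layers are skipped and the top `budget`
    layers are run, mirroring the idea of spending compute on upper layers.
    """
    if not (1 <= budget <= num_layers):
        raise ValueError("budget must be between 1 and num_layers")
    return list(range(num_layers - budget, num_layers))


if __name__ == "__main__":
    schedule = exit_schedule(seq_len=5, max_layers=24, min_layers=8)
    for position, budget in enumerate(schedule):
        run = layers_to_run(24, budget)
        print(f"position {position}: layers {run[0]}..{run[-1]} ({budget} layers)")
```

Because the budget only shrinks, the set of layers run at a later position is always a subset of those run at earlier positions, so in this simplified picture a later token never needs a cached key or value from a layer that an earlier token skipped.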
- Africa > Kenya > Garissa County > Garissa (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > Dominican Republic (0.04)
PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization
Liu, Xiaochen, Gao, Yang, Bai, Yu, Li, Jiawei, Hu, Yinan, Huang, Heyan, Chen, Boxing
Few-shot abstractive summarization has become a challenging task in natural language generation. To support it, we designed a novel soft prompts architecture coupled with a prompt pre-training plus fine-tuning paradigm that is effective and tunes only extremely light parameters. The soft prompts include continuous input embeddings across an encoder and a decoder to fit the structure of the generation models. Importantly, a novel inner-prompt placed in the text is introduced to capture document-level information. The aim is to devote attention to understanding the document that better prompts the model to generate document-related content. The first step in the summarization procedure is to conduct prompt pre-training with self-supervised pseudo-data. This teaches the model basic summarizing capabilities. The model is then fine-tuned with few-shot examples. Experimental results on the CNN/DailyMail and XSum datasets show that our method, with only 0.1% of the parameters, outperforms full-model tuning where all model parameters are tuned. It also surpasses Prompt Tuning by a large margin and delivers competitive results against Prefix-Tuning with 3% of the parameters.
- Africa > Kenya > Nairobi City County > Nairobi (0.05)
- Europe > Spain > Galicia > Madrid (0.05)
- Africa > Kenya > Garissa County > Garissa (0.04)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Education (0.94)
- Law Enforcement & Public Safety > Terrorism (0.68)
Apple and Malala Fund partnership takes major new step into Latin America
How do you get every single girl a full 12 years of quality education? That's the question at the heart of the Malala Fund, the organisation set up by Malala Yousafzai, the young Nobel Prize winner. And she wants to provide this education in parts of the world where it can't be taken for granted. Luckily, she has a powerful ally. In January, Apple revealed a tie-up with Malala Fund as part of the initial goal of getting 100,000 girls into education in Afghanistan, Pakistan, Lebanon, Turkey and Nigeria. But today it has been announced that the collaboration is expanding to Latin America. This expansion means grants will be offered to advocates in Brazil, who will join the Malala Fund's network of so-called Gulmakai Champions.
- North America > Central America (0.60)
- Africa > Malawi (0.29)
- Asia > Pakistan (0.26)
Using Search Queries to Understand Health Information Needs in Africa
Abebe, Rediet, Hill, Shawndra, Vaughan, Jennifer Wortman, Small, Peter M., Schwartz, H. Andrew
The lack of comprehensive, high-quality health data in developing nations creates a roadblock for combating the impacts of disease. One key challenge is understanding the health information needs of people in these nations. Without understanding people's everyday needs, concerns, and misconceptions, health organizations and policymakers lack the ability to effectively target education and programming efforts. In this paper, we propose a bottom-up approach that uses search data from individuals to uncover and gain insight into health information needs in Africa. We analyze Bing searches related to HIV/AIDS, malaria, and tuberculosis from all 54 African nations. For each disease, we automatically derive a set of common search themes or topics, revealing a wide-spread interest in various types of information, including disease symptoms, drugs, concerns about breastfeeding, as well as stigma, beliefs in natural cures, and other topics that may be hard to uncover through traditional surveys. We expose the different patterns that emerge in health information needs by demographic groups (age and sex) and country. We also uncover discrepancies in the quality of content returned by search engines to users by topic. Combined, our results suggest that search data can help illuminate health information needs in Africa and inform discussions on health policy and targeted education efforts both on- and offline.
- Africa > Nigeria (0.05)
- Africa > Botswana (0.05)
- Africa > West Africa (0.04)
After Niger attack, a look at clandestine jihadists posing a growing danger to U.S. forces in Africa
As America increases its military footprint in some of Africa's most dangerous trouble spots, confronting extremist affiliates of Al Qaeda and Islamic State, the risk of intelligence failures and more combat deaths is mounting. U.S. special forces who accompanied Niger's military at a meeting of village leaders in Tongo Tongo on Oct. 4 were working in the country's treacherous western borderlands, a region of shifting tribal allegiances, opaque motives and ethnic grudges going back decades, all feeding into a growing jihadist problem. Four Americans and five Nigerien troops died after leaving Tongo Tongo and being ambushed and heavily outgunned by fighters armed with automatic weapons and rocket-propelled grenades. The militants are believed to be from a Malian-led militia, the Islamic State in the Greater Sahel, which declared allegiance to the overall militant organization in 2015. One error appears to have been downplaying the danger.
- Africa > Mali (0.06)
- Africa > Nigeria (0.05)
- North America > United States > Maine (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- North America > United States (0.92)
- Africa > Middle East > Somalia (0.79)
- Africa > Kenya > Garissa County > Garissa (0.43)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.92)
Strike against terror: US drone hit 'most likely' killed al-Shabab chief
A U.S. drone strike in Somalia "most likely" killed Hassan Ali Dhoore, a senior leader of the terror group al-Shabab who had planned attacks that killed three Americans overseas, a U.S. official confirmed to Fox News Friday. Dhoore was riding in a vehicle with two other al-Shabab members Thursday evening when the strike took place about 20 miles south of Jilib in southern Somalia, according to a senior U.S. defense official. The Pentagon had been watching him off and on for a long time, the senior official added, saying the Somali government was involved in sharing information that led to this strike. U.S. officials say Dhoore helped facilitate a deadly Christmas Day 2014 attack at a Somali airport and a March 2015 attack at the Maka Al-Mukarramah Hotel, both in Mogadishu. U.S. citizens were among those killed in the two attacks, the officials said.
- Africa > Middle East > Somalia > Banaadir > Mogadishu (0.27)
- Africa > Kenya > Nairobi City County > Nairobi (0.07)
- Africa > Kenya > Garissa County > Garissa (0.07)
- Information Technology > Robotics & Automation (0.63)
- Government > Military (0.63)
- Government > Regional Government > North America Government > United States Government (0.51)